Abstract

We are given a graph $G$ with $n$ vertices, where a random subset of $k$ vertices has been
made into a clique, and the remaining edges are chosen independently with probability $\frac12$. This
random graph model is denoted $G(n, \frac12, k)$. The hidden clique problem is to design an algorithm
that finds the $k$-clique in polynomial time with high probability. An algorithm due to Alon,
Krivelevich and Sudakov uses spectral techniques to find the hidden clique with high probability
when $k = c\sqrt{n}$ for a sufficiently large constant $c > 0$. Recently, an algorithm that solves the
same problem was proposed by Feige and Ron. It has the advantages of being simpler and more
intuitive, and of an improved running time of $O(n^2)$. However, the analysis in the paper gives
success probability of only 2/3. In this paper we present a new algorithm for finding hidden
cliques that both runs in time $O(n^2)$ and has a failure probability that is less than polynomially
small.

1 Introduction 

A clique in a graph $G$ is a subset of its vertices any two of which are connected by an edge. The
problem of determining the size of the maximum clique in a graph is known to be NP-complete
[20]. It has also been proved [11, 5, 4] that, assuming P $\neq$ NP, there exists a constant $b > 0$ for
which it is hard to approximate the size of the maximum clique within a factor of $n^b$. Therefore, it
is natural to investigate the hardness of this problem in the average case.

The Erdős-Rényi random graph model, also denoted $G(n, \frac12)$, is a probability measure on graphs
with $n$ vertices. In this model, a random graph is generated by choosing each pair of vertices
independently with probability $\frac12$ to be an edge. It is known that with probability tending to 1 as $n$
tends to infinity, the size of the largest clique in $G(n, \frac12)$ is $(2+o(1))\log n$. There exists a polynomial
time algorithm (see for example [16]) that finds a clique of size $(1+o(1))\log n$ in $G(n, \frac12)$ with high
probability, but even though in expectation $G(n, \frac12)$ contains many cliques of size $(1+\varepsilon)\log n$ for
any fixed $0 < \varepsilon < 1$, there is no known polynomial time algorithm that finds one. It is plausible to
conjecture that this problem is computationally hard, and this hardness has been used in several
cryptographic applications [22, 19].

Finding a large clique may be easier in models where the graphs contain larger cliques. Define,
therefore, the hidden clique model, denoted by $G(n, \frac12, k)$. In this model, a random $n$-vertex graph
is generated by randomly choosing $k$ vertices to form a clique, and choosing every other pair of
vertices independently with probability $\frac12$ to be an edge. Jerrum [18] and Kucera [23] suggested this
model independently and posed the problem of finding the hidden clique. When $k \ge c_0\sqrt{n \log n}$
for some sufficiently large constant $c_0$, Kucera observed [23, Thm. 6.1] that the hidden clique can
be found with high probability by taking the $k$ highest degree vertices in the graph. For $k = c\sqrt{n}$,
there is an algorithm due to Alon, Krivelevich and Sudakov [3] that finds the hidden clique with
high probability when $c$ is sufficiently large using spectral techniques. In a more recent paper [14],
Feige and Ron propose a simple algorithm that runs in time $O(n^2)$ and finds the hidden clique
for $k = c\sqrt{n}$ with probability at least 2/3. In this paper we present a new algorithm that has the
advantages of both algorithms, as it runs in time $O(n^2)$, and fails with probability that is less than
polynomially small in $n$. The algorithm has three phases, and it uses two parameters: $0 < \alpha < 1$
and $\beta > 0$. In the first phase, we iteratively find subgraphs of the input graph $G$. Denote these
subgraphs by $G = G_0 \supset G_1 \supset G_2 \supset \cdots$. Given $G_i$, we define $G_{i+1}$ as follows: Pick a random
subset of vertices $S_i \subseteq V(G_i)$ that contains each vertex with probability $\alpha$. Define $V_i$ as the set
that contains all the vertices in $G_i$ that are not in $S_i$ and that have at least
$\frac12|S_i| + \beta\frac{\sqrt{|S_i|}}{2}$ neighbors in $S_i$, namely

$$V_i = \left\{ v \in V(G_i) \setminus S_i : \left|\{u \in S_i : (u,v) \in E(G_i)\}\right| \ge \tfrac12|S_i| + \beta\tfrac{\sqrt{|S_i|}}{2} \right\}.$$

Define $G_{i+1}$ to be the induced subgraph of $G_i$ containing only the vertices in $V_i$. We choose $\alpha$ and
$\beta$ in such a way that the relative size of the hidden clique grows with each iteration. We repeat
the process $t$ times, until we are left with a subgraph where the hidden clique is large enough so
we can continue to the second phase. A logarithmic number of iterations is enough. For the exact
way of choosing $\alpha$, $\beta$ and $t$, see the proof of Lemma 2.10.

In the second phase, we find $\tilde K$, the subset of the hidden clique contained in $G_t$. This is done
by estimating the number of clique vertices in $G_t$ by $k_t$, then defining $K'$ as the set of $k_t$ largest
degree vertices in $G_t$, and letting $\tilde K$ contain all the vertices in $G_t$ that have at least $\frac34 k_t$ neighbors
in $K'$. In the third phase of the algorithm, we find the rest of the hidden clique using $\tilde K$. This is
done by letting $G'$ be the induced subgraph of $G$ containing $\tilde K$ and all its common neighbors. Let
$K^*$ be the set of the $k$ largest degree vertices in $G'$. Then $K^*$ is the set returned by the algorithm
as the candidate for the hidden clique.
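The three phases described above can be sketched in a few lines of Python. This is only an illustrative sketch under several assumptions that are not part of the paper: the graph is stored as a list of neighbor sets, the estimate $k_t$ is taken to be $\rho^t k$ with $\rho = (1-\alpha)\bar\Phi(\beta - c\sqrt{\alpha})$ as in Definition 2.1, the threshold for $\tilde K$ is $\frac34 k_t$, and the function names (`planted_clique_graph`, `top_by_degree`, `find_hidden_clique`) are hypothetical. The number of iterations `t` is passed in by the caller rather than derived as in Lemma 2.10.

```python
import math
import random


def gaussian_tail(x):
    """Upper tail P(Z > x) of a standard normal variable."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def planted_clique_graph(n, k, rng):
    """Sample from G(n, 1/2, k); returns (list of neighbor sets, clique)."""
    clique = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if (u in clique and v in clique) or rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    return adj, clique


def top_by_degree(adj, vertices, count):
    """The `count` vertices of largest degree in the subgraph induced by `vertices`."""
    ranked = sorted(vertices, key=lambda v: (-len(adj[v] & vertices), v))
    return set(ranked[:count])


def find_hidden_clique(adj, k, alpha, beta, t, rng):
    """Illustrative three-phase search; returns the candidate set K*."""
    n = len(adj)
    c = k / math.sqrt(n)
    rho = (1 - alpha) * gaussian_tail(beta - c * math.sqrt(alpha))

    # Phase 1: repeatedly sample S_i and keep the vertices outside S_i
    # whose number of neighbors in S_i reaches the threshold.
    current = set(range(n))
    for _ in range(t):
        s = {v for v in sorted(current) if rng.random() < alpha}
        threshold = 0.5 * len(s) + beta * math.sqrt(len(s)) / 2
        current = {v for v in current - s if len(adj[v] & s) >= threshold}

    # Phase 2: K' is the set of k_t largest-degree vertices of G_t;
    # K~ keeps the vertices of G_t with at least (3/4) k_t neighbors in K'.
    k_t = round(rho ** t * k)
    k_prime = top_by_degree(adj, current, k_t)
    k_tilde = {v for v in current if len(adj[v] & k_prime) >= 0.75 * k_t}

    # Phase 3: G' is induced by K~ and its common neighbors;
    # K* is the set of k largest-degree vertices of G'.
    g_prime = k_tilde | {v for v in range(n) if k_tilde <= adj[v]}
    return top_by_degree(adj, g_prime, k)
```

For a quick experiment one can call `planted_clique_graph(1000, 200, random.Random(0))` and pass the resulting adjacency list to `find_hidden_clique` with `alpha=0.3728`, `beta=0.72` and `t=1`. These sizes are chosen for illustration only and are far from the asymptotic regime analysed in the paper.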

Theorem 1.1. If $c \ge c_0$ then there exist $0 < \alpha < 1$ and $\beta > 0$ such that, given $G \in G(n, \frac12, c\sqrt{n})$, the probability
that $K^* = K^*(\alpha, \beta)$ is the hidden clique is at least $1 - e^{-\Theta(n^{\varepsilon_0})}$ for some $\varepsilon_0 = \varepsilon_0(c)$.

Numerical calculations show that $c_0$ is close to 1.65. For a mathematical definition of $c_0$ see
Definition 2.2. A refinement of the algorithm that works with high probability for all $c \ge 1.261$ is
presented in Sec. 3.1.

1.1 Related Work 

Since [3], there have been many papers describing algorithms that solve various variants of the
hidden clique problem. In [12] an algorithm for finding hidden cliques of size $\Omega(\sqrt{n})$ based on the
Lovász theta function is given, which has two advantages. The first is being able to find the clique
also in a semi-random hidden clique model, in which an adversary can remove edges that are not
in the clique, and the second is being able to certify the optimality of its solution by providing an
upper bound on the size of the maximum clique in the graph.

McSherry [25] gives an algorithm that solves the more general problem of finding a planted
partition. In the random graph model described there, we are given a graph where the vertices are
randomly partitioned into $m$ classes, and between every pair of vertices where one is in class $i$ and
the other in class $j$ there is an edge with probability $p_{ij}$. With the appropriate parameters, this
model can be reduced both to the hidden clique model and to the hidden dense graph model that
we describe in Sec. 3.2. For both these cases, the result is a polynomial time algorithm that finds
the hidden clique (dense graph) with high probability for $k = c\sqrt{n}$.

Several attempts have been made to develop polynomial time algorithms for finding hidden
cliques of size $k = o(\sqrt{n})$, so far with no success. For example, Jerrum [18] described the Metropolis
process and proved that it cannot find the clique when $k = o(\sqrt{n})$. Feige and Krauthgamer [13]
explain why the algorithm described in [12] fails when $k = o(\sqrt{n})$. Frieze and Kannan [15] give
an algorithm to find a hidden clique of size $k = \Omega(n^{1/3}\log^4 n)$; however, the algorithm maximizes
a certain cubic form, and there are no known polynomial time algorithms for maximizing cubic
forms. In Sec. 2.1.3 we give an algorithm that finds the hidden clique when we are given a small
part of it by an oracle or an adversary. We prove that for any $k = \omega(\log n \log\log n)$, knowing only
$\log n + 1$ vertices of the hidden clique enables us to find the rest of them with high probability. For
smaller $k$'s, $\log n + 1$ is not enough, but $(1+\varepsilon)\log n$ is.

There are many problems in different fields of computer science that are related to the hidden
clique problem. Among others, there are connections to cryptography, testing and game theory.
For connections to cryptography, see for example [22], where an encryption scheme based on hiding
an independent set in a graph is described, or [19], where the function whose input is a graph $G$ and
a set $K$ of $k$ vertices and whose output is $G$ with a clique on $K$ is proposed as a one-way function
for certain values of $k$. For connections to testing, see [2], where Alon et al. prove that if there is
no polynomial time algorithm to find hidden cliques of size $t > \log^3 n$ then there is no polynomial
time algorithm that can test $k$-wise independence of a distribution even when given a polynomial
number of samples from it, for $k = O(\log n)$. For connections to game theory, see [17], where Hazan
and Krauthgamer prove that if there is a polynomial time algorithm that finds a Nash equilibrium
of a two player game whose social welfare is close to the maximum, then there is a randomized
polynomial time algorithm that finds the hidden clique for $k = O(\log n)$. The hidden clique model
is also related to the planted-SAT model [7, 21] and some models in computational biology [6].

2 Proof of Thm. 1.1

Throughout the paper we use the following notations. 

Notation 2.1. Given a graph $G = (V, E)$, for every $v \in V$ and $S \subseteq V$ we denote by $d_S(v)$ the
number of neighbors $v$ has in $S$. Formally,

$$d_S(v) = |\{u \in S : (u,v) \in E\}|.$$

We abbreviate $d_V(v)$ by $d(v)$.

Notation 2.2. Let $\varphi(x)$ denote the Gaussian probability density function $\varphi(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}$. We
denote by $\Phi(x)$ the Gaussian cumulative distribution function $\Phi(x) = \int_{-\infty}^{x}\varphi(t)\,dt$, and $\bar\Phi(x) =
1 - \Phi(x)$.

Notation 2.3. All logarithms in the paper are base 2. 

Notation 2.4. We use the shorthand "whp($f(n)$)" to mean: "with probability at least $1 - f(n)$".

Definition 2.1. Given $0 < \alpha < 1$ and $\beta > 0$, we define

$$r = (1-\alpha)\bar\Phi(\beta), \qquad \rho = (1-\alpha)\bar\Phi(\beta - c\sqrt{\alpha}).$$

Definition 2.2. For every $\alpha, \beta$, denote the minimal $c$ for which $\rho \ge \sqrt{r}$ by $c(\alpha, \beta)$. Define $c_0$ as
the infimum of $c(\alpha, \beta)$ for $0 < \alpha < 1$ and $\beta > 0$.

Definition 2.3. Define $n = n_0, n_1, n_2, \ldots$ and $k = k_0, k_1, \ldots$ by $n_i = r^i n$ and $k_i = \rho^i k$. Define
also $n = \tilde n_0, \tilde n_1, \ldots$ and $k = \tilde k_0, \tilde k_1, \ldots$ to be the actual sizes of $G_i$ and the hidden clique in $G_i$
respectively when running the algorithm.



2.1 Proving the correctness of the algorithm 

In order to prove the correctness of the algorithm, we examine each of the three phases of the
algorithm. First, we prove that in every iteration, with high probability $\tilde n_i, \tilde k_i$ are close to $n_i, k_i$
respectively. We do this by first proving that in every iteration the graph $G_i$ is a copy of $G(\tilde n_i, \frac12, \tilde k_i)$,
and therefore it is enough to prove that given a graph in $G(n, \frac12, k)$, with high probability $|V_0|$ is
close to $rn$ and $|V_0 \cap K|$ is close to $\rho k$. Here, the high probability should be high enough to remain
high even after $t$ iterations. Next, we prove that with high probability $\tilde K$ is a subset of the hidden
clique. Last, we prove that with high probability $K^*$ is the hidden clique.



2.1.1 Proving the correctness of the first phase of the algorithm 

Lemma 2.4. For every $i \ge 0$, the graph $G_i$ defined in the $i$'th iteration of the algorithm is a copy
of $G(\tilde n_i, \frac12, \tilde k_i)$.

Proof. We prove this by induction. Assume that $G_i$ is a copy of $G(\tilde n_i, \frac12, \tilde k_i)$. Consider the following
equivalent way of generating $G(\tilde n_i, \frac12, \tilde k_i)$: First, pick the $\tilde k_i$ hidden clique vertices. Then pick the
set $S_i$. Then pick all the edges between $V(G_i) \setminus S_i$ and $S_i$. At this point, we still need to pick
the edges in $S_i$ and in $V(G_i) \setminus S_i$, but we already have enough information to find $V_i$, which is the
vertex set of $G_{i+1}$. Since we can find the vertices of $G_{i+1}$ before exposing any of the edges in it, it
is a copy of $G(\tilde n_{i+1}, \frac12, \tilde k_{i+1})$. □

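Lemma 2.5. For every $0 < \varepsilon_1 < \frac12$ and $0 < \varepsilon_2 < \frac12$, the set $S_0$ satisfies $\big||S_0| - \alpha n\big| \le O(n^{1-\varepsilon_1})$
and $\big||S_0 \cap K| - \alpha k\big| \le O(k^{1-\varepsilon_2})$ whp($e^{-\Theta(n^{1-2\varepsilon_1})} + e^{-\Theta(k^{1-2\varepsilon_2})}$).

Proof. Follows directly from Thm. A.3 by setting $t = n^{1-\varepsilon_1}$ for the bound on $|S_0|$ and $t = k^{1-\varepsilon_2}$
for the bound on $|S_0 \cap K|$. □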

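Lemma 2.6. For every $0 < \varepsilon_1 < \frac12$ and $0 < \varepsilon_2 < \frac12$, the set $V_0$ satisfies $\big||V_0| - rn\big| \le O(n^{1-\varepsilon_1})$ and
$\big||V_0 \cap K| - \rho k\big| \le O(k^{1-\varepsilon_2})$ whp($e^{-\Theta(n^{1-2\varepsilon_1})} + e^{-\Theta(k^{1-2\varepsilon_2})}$).

Proof. Assume that the events $|S_0| = (1+o(1))\alpha n$ and $|S_0 \cap K| = (1+o(1))\alpha k$ both occur. By
Lemma 2.5 this happens with high probability. We can now apply Cor. A.4 twice.
For the vertices in $(V \setminus S_0) \setminus K$, the result follows directly from Cor. A.4 by setting $\varepsilon = \varepsilon_1$. For
$v \in (V \setminus S_0) \cap K$, all the (roughly $\alpha k$) clique vertices in $S_0$ are neighbors of $v$, so having
$d_{S_0}(v) \ge \frac12\alpha n + \beta\frac{\sqrt{\alpha n}}{2}$ is equivalent to having

$$d_{S_0 \setminus K}(v) \ge \tfrac12\alpha(n-k) + \tfrac12(\beta - c\sqrt{\alpha})\sqrt{\tfrac{n}{n-k}}\sqrt{\alpha(n-k)}.$$

So setting $\varepsilon = \varepsilon_2$ in Cor. A.4 gives that

$$\mathbb{P}\left(\big||V_0 \cap K| - \rho' k\big| \le O(k^{1-\varepsilon_2})\right) \ge 1 - e^{-\Theta(k^{1-2\varepsilon_2})},$$

where $\rho' = (1-\alpha)\bar\Phi\big((\beta - c\sqrt{\alpha})\sqrt{\tfrac{n}{n-k}}\big)$. But the difference between $\rho$ and $\rho'$ is of order $\frac{1}{\sqrt{n}}$, which
means that the result holds for $\big||V_0 \cap K| - \rho k\big|$ as well. □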

Remark 2.1. In order to get a success probability that tends to 1, we need to bound the sum of
the probabilities of failing in each iteration by $o(1)$. We refer the reader to Sec. 2.2 for a detailed
analysis of the failure probability of the algorithm.

2.1.2 Proving the correctness of the second phase of the algorithm 

We start by bounding the probability that a hidden clique of size k contains the k largest degree 
vertices in the graph. 

Lemma 2.7. Let $G \in G(n, \frac12, k)$. Then whp($e^{-(k^2/8n - \log n - O(1))}$) the clique vertices are the $k$
largest degree vertices in the graph. Formally, if we denote the hidden clique by $K$, and the set of
$k$ largest degree vertices by $M$, then

$$\mathbb{P}(|K \setminus M| > 0) \le e^{-(k^2/8n - \log n - O(1))}.$$

Proof. Define $x = \frac14 k$. Then by Thm. A.3

$$\mathbb{P}(\exists v \notin K : d(v) \ge \tfrac12 n + x) \le n\,\mathbb{P}(B(n, \tfrac12) \ge \tfrac12 n + x) \le n\,\mathbb{P}(|B(n, \tfrac12) - \tfrac12 n| \ge x) \le 2ne^{-k^2/8n}.$$

On the other hand,

$$\mathbb{P}(\exists v \in K : d(v) < \tfrac12 n + x) \le k\,\mathbb{P}(B(n-k, \tfrac12) < \tfrac12(n-k) + x - \tfrac12 k)
\le k\,\mathbb{P}(|B(n-k, \tfrac12) - \tfrac12(n-k)| \ge x) \le 2ke^{-k^2/8n}.$$

Therefore, the probability that there exist a non-clique vertex $v$ and a clique vertex $u$ such that
$d(u) < d(v)$ is bounded by $2(n+k)e^{-k^2/8n}$. □
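To get a feeling for when the bound $2(n+k)e^{-k^2/8n}$ from the proof above is meaningful, it can be evaluated numerically. The sketch below is illustrative only; the function name `degree_rule_failure_bound` and the sample values of $n$ and $k$ are assumptions made for the example, not quantities taken from the paper.

```python
import math


def degree_rule_failure_bound(n, k):
    """Upper bound 2(n + k) * exp(-k^2 / (8n)) from the proof of Lemma 2.7."""
    return 2 * (n + k) * math.exp(-k * k / (8 * n))


n = 10**6
k_small = int(1.65 * math.sqrt(n))             # k = c * sqrt(n) with c = 1.65
k_large = int(4 * math.sqrt(n * math.log(n)))  # k = 4 * sqrt(n ln n)

bound_small = degree_rule_failure_bound(n, k_small)
bound_large = degree_rule_failure_bound(n, k_large)
```

With these sample values the bound is vacuous (larger than 1) for $k = 1.65\sqrt{n}$ and very small for $k$ of order $\sqrt{n \log n}$, which is consistent with applying the largest-degree rule only after the iterations of the first phase have increased the relative size of the clique.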

Corollary 2.8. If the algorithm does $t$ iterations before finding $\tilde K$ and succeeds in every iteration,
then whp($e^{-\Theta(k_t^2/n_t)}$), $\tilde K$ is a subset of the original hidden clique.

Proof. The algorithm estimates $\tilde k_t$, the number of hidden clique vertices in $G_t$, by $k_t = \rho^t k$. If
the input graph has $n$ vertices and a hidden clique of size $k = c\sqrt{n}$, and all the iterations are
successful, then $|\tilde k_t - k_t| \le O(k_t^{1-\varepsilon_2})$. Recall that $K'$ is defined as the $k_t$ largest degree vertices in
$G_t$. By Lemma 2.7, whp($e^{-\Theta(k_t^2/n_t)}$) the hidden clique vertices have the largest degrees in $G_t$, so if
$\tilde k_t < k_t$ then $K'$ contains all the hidden clique vertices in $G_t$ plus $O(k_t^{1-\varepsilon_2})$ non-clique vertices, and
if $\tilde k_t > k_t$, then $K'$ contains all the hidden clique vertices in $G_t$ except for $O(k_t^{1-\varepsilon_2})$ of them. In both
cases, every hidden clique vertex in $G_t$ has at least $k_t - O(k_t^{1-\varepsilon_2})$ neighbors in $K'$. Whp($e^{-\Theta(k_t^2/n_t)}$)
every non-clique vertex in $G_t$ has fewer than $\frac34 k_t$ neighbors in $K'$ (this follows from Thm. A.3 and
the union bound). Thus, if we define $\tilde K = \{v \in V(G_t) : d_{K'}(v) \ge \frac34 k_t\}$ then whp($e^{-\Theta(k_t^2/n_t)}$), $\tilde K$
contains every clique vertex in $G_t$, and no non-clique vertex in $G_t$. □



2.1.3 Proving the correctness of the third phase of the algorithm 

In order to prove that K* is the hidden clique with high probability, we prove a more general 
Lemma. We prove that if an adversary reveals a subset of the clique that is not too small, we can 
use it to find the whole clique. 

Lemma 2.9 (Finding hidden cliques from partial information). We are given a random graph
$G \in G(n, \frac12, k)$, and a subset of the hidden clique $\tilde K \subseteq K$ of size $s$. Suppose that either

(a) $k \le O(\log n \log\log n)$ and $s \ge (1+\varepsilon)\log n$ for some $\varepsilon > 0$, or

(b) $k \ge \omega(\log n \log\log n)$ and $s \ge \log n + 1$.

Let $G'$ denote the subgraph of $G$ induced by $\tilde K$ and all its common neighbors, and define $K^*$ to be
the $k$ largest degree vertices of $G'$. Then for every $0 < \varepsilon_3 < \frac12$, whp($e^{-\Theta(s\log k + \log n)} + e^{-\Theta(k^{1-2\varepsilon_3})}$),
$K^* = K$.

Proof. Consider an arbitrary subset of $K$ of size $s$. The probability that its vertices have at least
$l_0$ non-clique common neighbors can be bounded by $\sum_{l=l_0}^{n-k}\binom{n-k}{l}2^{-sl}$. Taking union bound over all
subsets of size $s$ of $K$ gives that the probability that there exists a subset with at least $l_0$ non-clique
common neighbors is bounded by

$$\binom{k}{s}\sum_{l=l_0}^{n-k}\binom{n-k}{l}2^{-sl} \le k^s\sum_{l=l_0}^{n-k}\left(n2^{-s}\right)^{l}.$$

Therefore, this is also a bound on the probability that the set $\tilde K$ has at least $l_0$ non-clique common neighbors.
So we have

$$\mathbb{P}(|V(G')| \ge k + l_0) \le 2^{\log n + s\log k + l_0(\log n - s)}.$$

By our assumptions on $s$, we know that $\log n - s$ is negative. Therefore, we can take $l_0 =
\frac{2(s\log k + \log n)}{s - \log n}$ and get that whp($2^{-s\log k - \log n}$), there are at most $l_0$ non-clique vertices that are adjacent
to all of $\tilde K$. Recall that the probability that there exists a non-clique vertex in $G$ with more than
$\frac{k}{2} + k^{1-\varepsilon_3}$ neighbors in the hidden clique is bounded by $e^{-\Theta(k^{1-2\varepsilon_3})}$. Therefore, whp($e^{-\Theta(k^{1-2\varepsilon_3})}$),
the degrees of all the non-clique vertices in $G'$ are at most $\frac{k}{2} + k^{1-\varepsilon_3} + l_0$. If $s$ and $k$ are such that
$l_0 = o(k)$, this value is smaller than $k - 1$. On the other hand, all the clique vertices in $G'$ have
degree at least $k - 1$, so the clique vertices have the largest degrees in $G'$.

If $k = \omega(\log n \log\log n)$ then letting $s = \log n + 1$ gives $l_0 = 2(\log n + \log n\log k + \log k)$. Clearly,
$\log n + \log k = o(k)$. To see that $\log n \log k = o(k)$, denote $k = \log n \cdot f(n)$ where $f(n) = \omega(\log\log n)$.
Then $\log n\log k = \log n\,(\log\log n + \log f(n))$. Clearly, $\log n\log f(n) = o(\log n \cdot f(n))$, and from
the definition of $f(n)$ we also have $\log n\log\log n = o(\log n \cdot f(n))$.

If $k \le O(\log n \log\log n)$, then letting $s \ge (1+\varepsilon)\log n$ for some small $\varepsilon > 0$ is enough, since then

$$l_0 \le \frac{2}{\varepsilon} + \frac{2(1+\varepsilon)}{\varepsilon}\log k = o(k).$$ □
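The recovery step of Lemma 2.9 is simple enough to try out directly. The following sketch is only an illustration under assumptions of our own: the graph is a list of neighbor sets, the names `planted_clique_graph` and `complete_clique` are hypothetical, and the example sizes are small and not in the asymptotic regime of the lemma.

```python
import random


def planted_clique_graph(n, k, rng):
    """Sample from G(n, 1/2, k); returns (list of neighbor sets, clique)."""
    clique = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if (u in clique and v in clique) or rng.random() < 0.5:
                adj[u].add(v)
                adj[v].add(u)
    return adj, clique


def complete_clique(adj, known, k):
    """Extend a known subset of the hidden clique to a candidate of size k."""
    known = set(known)
    # G' is induced by the known vertices together with all their common neighbors.
    g_prime = known | {v for v in range(len(adj)) if known <= adj[v]}
    # K* is the set of k largest-degree vertices of G' (ties broken by label).
    ranked = sorted(g_prime, key=lambda v: (-len(adj[v] & g_prime), v))
    return set(ranked[:k])
```

As a usage example, with $n = 400$ and $k = 60$ one has $\log n \approx 8.6$, so revealing $s = 12$ clique vertices satisfies $s \ge \log n + 1$; the expected number of non-clique common neighbors of the revealed set is then $(n-k)2^{-s} \approx 0.08$.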
2.2 Bounding the failure probability

Lemma 2.10. For every $c \ge c_0$, there exist $0 < \alpha < 1$ and $\beta > 0$ such that if we define $a = -\log r$
and $b = -\log\rho^2$, then for every $\varepsilon_0 < \frac12$, the failure probability of the algorithm is at most $e^{-\Theta(n^{\varepsilon_0})}$.

Proof. In order for the probability proven in Cor. 2.8 to tend to 0, we need $r$ and $\rho$ to satisfy
$\frac{\rho}{\sqrt{r}} \ge 1$. From Definition 2.2 we know that for $c \ge c_0$ there exist $\alpha, \beta$ that satisfy this inequality.
Numerical calculations show that $c_0$ is close to 1.65. The values of $\alpha$ and $\beta$ for which $c(\alpha, \beta) = 1.65$
are $\alpha = 0.3728$ and $\beta = 0.72$. For these values, we get $r \approx 0.14787$ and $\rho \approx 0.38455$, and
$\frac{\rho}{\sqrt{r}} \approx 1.00003$.

Let the number of iterations be $t = \varepsilon_4\log n$ for some $0 < \varepsilon_4 < \frac1a$. We use the union bound to
estimate the failure probability during the iteration phase of the algorithm. By Lemmas 2.5 and
2.6 this probability is at most $\sum_{i=0}^{t}\big(e^{-\Theta(n_i^{1-2\varepsilon_1})} + e^{-\Theta(k_i^{1-2\varepsilon_2})}\big)$, which can be upper bounded by

$$e^{-\Theta\left(n^{(1-2\varepsilon_1)(1-\varepsilon_4 a)}\right)} + e^{-\Theta\left(n^{\frac12(1-2\varepsilon_2)(1-\varepsilon_4 b)}\right)}.$$

By Cor. 2.8 the failure probability in the step of finding $\tilde K$ is bounded by $e^{-\Theta(n^{\varepsilon_4})}$. Finally, if $t$ is
as defined above, then assuming the first two phases succeed, $|\tilde K| \ge \rho^t k - o(\rho^t k) = k^{1-b\varepsilon_4}(1 - o(1))$
(notice that $b \le a$, so $\varepsilon_4 < \frac1a$ implies that $1 - b\varepsilon_4 > 0$). $\tilde K$ is large enough so that we can use
Lemma 2.9 to conclude that the probability of failing in the third phase is at most

$$e^{-\Theta\left(n^{\frac12(1-\varepsilon_4 b)}\log n\right)} + e^{-\Theta\left(k^{1-2\varepsilon_3}\right)}.$$

For any choice of $0 < \varepsilon_1, \varepsilon_2 < \frac12$ and $0 < \varepsilon_4 < \frac1a$, denote

$$\varepsilon_0 = \min\left\{\varepsilon_4,\ (1-2\varepsilon_1)(1-\varepsilon_4 a),\ \tfrac12(1-2\varepsilon_2)(1-\varepsilon_4 b)\right\},$$

and take $\varepsilon_3 = \frac{1-\varepsilon_0}{2}$ (notice that $\varepsilon_3 > 0$ because $\varepsilon_0 < \frac12$). With these parameters, the failure
probability of the whole algorithm is bounded by $e^{-\Theta(n^{\varepsilon_0})}$. □
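The numerical values quoted in the proof can be reproduced from Definition 2.1 with the standard library. The sketch below is illustrative; the helper names `gaussian_tail` and `shrink_factors` are hypothetical, and the Gaussian tail $\bar\Phi$ is computed through the complementary error function.

```python
import math


def gaussian_tail(x):
    """Upper tail of the standard normal distribution, i.e. 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def shrink_factors(alpha, beta, c):
    """Return (r, rho) as in Definition 2.1."""
    r = (1 - alpha) * gaussian_tail(beta)
    rho = (1 - alpha) * gaussian_tail(beta - c * math.sqrt(alpha))
    return r, rho


r, rho = shrink_factors(alpha=0.3728, beta=0.72, c=1.65)
ratio = rho / math.sqrt(r)   # should be very close to 1 for these parameters
a = -math.log2(r)            # exponents a and b from Lemma 2.10
b = -math.log2(rho ** 2)
```

Since the ratio is so close to 1 at these parameters, $a$ and $b$ nearly coincide, which is what one expects at the threshold value of $c$.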



3 Refinements 

3.1 A variation of this algorithm that works for smaller cliques 

The reason our algorithm works is that the clique vertices in $V(G_i) \setminus S_i$ have a boost of around $\frac12\alpha k_i$
(which is $c\sqrt{\alpha}$ times the standard deviation) to their degrees, so this increases the probability that
their degree is above the threshold. If we could increase the boost of the clique vertices' degrees (in
terms of number of standard deviations) while still keeping the graph for the next iteration random,
then we would be able to find the hidden clique for smaller values of $c$. One way to achieve this is
by finding a subset $\tilde S_i$ of $S_i$ that has $\gamma n_i$ vertices ($\gamma < \alpha$) and $\delta k_i$ clique vertices. If we count just
the number of neighbors the vertices in $V(G_i) \setminus S_i$ have in $\tilde S_i$, then the clique vertices have a boost
of around $\frac12\delta k_i$ to their degree, which is $c\frac{\delta}{\sqrt{\gamma}}$ times the standard deviation.

The subset of $S_i$ that we use in this variation is the set of all vertices $v \in S_i$ that have
$d_{S_i}(v) \ge \frac12|S_i| + \eta\frac{\sqrt{|S_i|}}{2}$ for some $\eta > 0$. Since these degrees are not independent we cannot use the same
concentration results we used before, so we first prove the following concentration result.
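Lemma 3.1. Let $G \in G(n, \frac12)$ and $a, c' > 0$. Define a random variable

$$X = \left|\left\{v \in V(G) : d(v) \ge \tfrac12 n + a\tfrac{\sqrt{n}}{2}\right\}\right|.$$

Then for every $0 < \varepsilon' < \frac14$ it holds that

$$\mathbb{P}\left(|X - \bar\Phi(a)n| \ge c'n^{1-\varepsilon'}\right) \le 2e^{-\pi c'^4 n^{1-4\varepsilon'}/32}.$$

Proof. For every $v \in V(G)$ define a random variable

$$X_v = \begin{cases} 1 & d(v) \ge \tfrac12 n + a\tfrac{\sqrt{n}}{2} \\ 0 & \text{otherwise.} \end{cases}$$

Then $X = \sum_v X_v$. By Cor. A.2 we have $|\bar\Phi(a)n - \mathbb{E}X| \le c\sqrt{n}$ for some constant $c$. To prove that
$X$ is concentrated around its mean we define additional random variables. Let $\varepsilon > 0$ be defined
later, and define three thresholds:

$$t_1 = \tfrac12 n + (a - \varepsilon)\tfrac{\sqrt{n}}{2}, \qquad t_2 = \tfrac12 n + a\tfrac{\sqrt{n}}{2}, \qquad t_3 = \tfrac12 n + (a + \varepsilon)\tfrac{\sqrt{n}}{2}.$$

For every $v \in V(G)$ define

$$F_v = \begin{cases} 0 & d(v) < t_1 \\ \frac{2(d(v) - t_1)}{\varepsilon\sqrt{n}} & t_1 \le d(v) < t_2 \\ 1 & d(v) \ge t_2 \end{cases}
\qquad
G_v = \begin{cases} 0 & d(v) < t_2 \\ \frac{2(d(v) - t_2)}{\varepsilon\sqrt{n}} & t_2 \le d(v) < t_3 \\ 1 & d(v) \ge t_3. \end{cases}$$

Define $F = \sum_v F_v$ and $G = \sum_v G_v$. For every $v \in V$, we bound $\mathbb{E}F_v - \mathbb{E}X_v$ and $\mathbb{E}X_v - \mathbb{E}G_v$.

$$\mathbb{E}F_v - \mathbb{E}X_v = \sum_{i=t_1}^{t_2} 2^{-n}\frac{2(i - t_1)}{\varepsilon\sqrt{n}}\binom{n}{i} \le 2^{-n}\sum_{i=t_1}^{t_2}\binom{n}{i} \le \frac{\varepsilon\sqrt{n}}{2}\,2^{-n}\binom{n}{n/2} \le \frac{\varepsilon}{\sqrt{2\pi}}(1 + o(1)), \qquad (1)$$

where the last two inequalities follow from the fact that $\binom{n}{n/2}$ is the maximal binomial coefficient,
and from Stirling's approximation (see, for example, [1]): $n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^n\left(1 + O(\frac1n)\right)$. Repeating
this calculation for $\mathbb{E}X_v - \mathbb{E}G_v$ gives

$$\mathbb{E}X_v - \mathbb{E}G_v = \sum_{i=t_2}^{t_3} 2^{-n}\left(1 - \frac{2(i - t_2)}{\varepsilon\sqrt{n}}\right)\binom{n}{i} \le \frac{\varepsilon}{\sqrt{2\pi}}(1 + o(1)). \qquad (2)$$

From (1) and (2) we have

$$\mathbb{P}(|X - \mathbb{E}X| \ge An) \le \mathbb{P}\left(F - \mathbb{E}F \ge (A - \tfrac{\varepsilon}{\sqrt{2\pi}})n\right) + \mathbb{P}\left(G - \mathbb{E}G \le -(A - \tfrac{\varepsilon}{\sqrt{2\pi}})n\right).$$

Thus, we need to calculate the concentration of $F$ and $G$. Both are edge exposure martingales with
Lipschitz constant $\frac{2}{\varepsilon\sqrt{n}}$. Therefore, by Azuma's inequality (see, for example, [24]) we get:

$$\mathbb{P}\left(F - \mathbb{E}F \ge (A - \tfrac{\varepsilon}{\sqrt{2\pi}})n\right) + \mathbb{P}\left(G - \mathbb{E}G \le -(A - \tfrac{\varepsilon}{\sqrt{2\pi}})n\right)
\le 2\exp\left(-\frac{(A - \frac{\varepsilon}{\sqrt{2\pi}})^2 n^2}{2\binom{n}{2}\left(\frac{2}{\varepsilon\sqrt{n}}\right)^2}\right) \le 2e^{-\varepsilon^2(A - \frac{\varepsilon}{\sqrt{2\pi}})^2 n/4}.$$

Choosing $A = c'n^{-\varepsilon'}$ and $\varepsilon = \frac12\sqrt{2\pi}\,c'n^{-\varepsilon'}$ concludes the proof. □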



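Lemma 3.2. Let $\tilde S_0 = \{v \in S_0 : d_{S_0}(v) \ge \frac12|S_0| + \eta\frac{\sqrt{|S_0|}}{2}\}$. Then for every $0 < \varepsilon_1 < \frac14$,
whp($e^{-\Theta(n^{1-4\varepsilon_1})}$) we have $\big||\tilde S_0| - \gamma n\big| \le O(n^{1-\varepsilon_1})$, where $\gamma = \alpha\bar\Phi(\eta)$. Furthermore, for every
$0 < \varepsilon_2 < \frac12$, whp($e^{-\Theta(k^{1-2\varepsilon_2})}$) we have $\big||\tilde S_0 \cap K| - \delta k\big| \le O(k^{1-\varepsilon_2})$, where $\delta = \alpha\bar\Phi(\eta - c\sqrt{\alpha})$.

Proof. By Lemma 2.5, whp($e^{-\Theta(n^{1-2\varepsilon_1})} + e^{-\Theta(k^{1-2\varepsilon_2})}$) the size of $S_0$ is $(1+o(1))\alpha n$ and the number
of clique vertices in $S_0$ is $(1+o(1))\alpha k$. The first part of the Lemma follows directly from Lemma 3.1
by setting $\varepsilon' = \varepsilon_1$. For the second part of the Lemma, consider a clique vertex $v \in S_0$. Having
$d_{S_0}(v) \ge \frac12\alpha n + \eta\frac{\sqrt{\alpha n}}{2}$ is equivalent to having

$$d_{S_0 \setminus K}(v) \ge \tfrac12\alpha(n-k) + \tfrac12(\eta - c\sqrt{\alpha})\sqrt{\tfrac{n}{n-k}}\sqrt{\alpha(n-k)}.$$

Thus, setting $\varepsilon = \varepsilon_2$ in Cor. A.4 gives that whp($e^{-\Theta(k^{1-2\varepsilon_2})}$), $\big||\tilde S_0 \cap K| - \delta' k\big| \le O(k^{1-\varepsilon_2})$, where
$\delta' = \alpha\bar\Phi\big((\eta - c\sqrt{\alpha})\sqrt{\tfrac{n}{n-k}}\big)$. The difference between $\delta$ and $\delta'$ is of order $\frac{1}{\sqrt{n}}$, which means that the
result holds for $\big||\tilde S_0 \cap K| - \delta k\big|$ as well. □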

Theorem 3.3. Consider the variant of the algorithm, where $V_i$ is defined by

$$V_i = \left\{v \in V(G_i) \setminus S_i : d_{\tilde S_i}(v) \ge \tfrac12|\tilde S_i| + \beta\tfrac{\sqrt{|\tilde S_i|}}{2}\right\}$$

with $S_i, \tilde S_i$ as defined in Lemma 3.2. If $c \ge 1.261$ then there exist $\alpha, \beta, \eta$ for which running the
variant of the algorithm described above on a random graph in $G(n, \frac12, c\sqrt{n})$ finds the hidden clique
whp($e^{-\Theta(n^{\varepsilon_0})}$) for some $\varepsilon_0 = \varepsilon_0(c)$.

Proof. We follow the proof of Thm. 1.1 with two differences. The first is that we use Lemma 3.2
instead of Lemma 2.5, which implies that instead of demanding $\varepsilon_1 < \frac12$ we demand $\varepsilon_1 < \frac14$. The
second is that in Lemma 2.6 and everything that follows we use a different definition for $\rho$. Since now
the clique vertices' degree boost is $c\frac{\delta}{\sqrt{\gamma}}$ times the standard deviation, we define $\rho = (1-\alpha)\bar\Phi\big(\beta - c\frac{\delta}{\sqrt{\gamma}}\big)$.
Next, for every $\alpha, \beta, \eta$, we denote by $c(\alpha, \beta, \eta)$ the minimal $c$ for which $\frac{\rho}{\sqrt{r}} \ge 1$. Denote the infimum
of $c(\alpha, \beta, \eta)$ by $c^*$. Numerical calculations show that $c^*$ is close to 1.261. The values of $\alpha$, $\beta$ and
$\eta$ for which $c(\alpha, \beta, \eta) = 1.261$ are $\alpha = 0.8$, $\beta = 2.3$ and $\eta = 1.2$. For these values, we get
$r \approx 0.0021448$ and $\rho \approx 0.046348$, and $\frac{\rho}{\sqrt{r}} \approx 1.0008$. □
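The constants quoted for the refined variant can be checked in the same way, using $\gamma = \alpha\bar\Phi(\eta)$ and $\delta = \alpha\bar\Phi(\eta - c\sqrt{\alpha})$ from Lemma 3.2 and $\rho = (1-\alpha)\bar\Phi(\beta - c\delta/\sqrt{\gamma})$. The sketch below is illustrative, and the function names are hypothetical.

```python
import math


def gaussian_tail(x):
    """Upper tail of the standard normal distribution, i.e. 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def refined_factors(alpha, beta, eta, c):
    """Return (gamma, delta, r, rho) for the variant of Sec. 3.1."""
    gamma = alpha * gaussian_tail(eta)                          # |S~_i| / n_i
    delta = alpha * gaussian_tail(eta - c * math.sqrt(alpha))   # clique share of S~_i
    r = (1 - alpha) * gaussian_tail(beta)
    rho = (1 - alpha) * gaussian_tail(beta - c * delta / math.sqrt(gamma))
    return gamma, delta, r, rho


gamma, delta, r, rho = refined_factors(alpha=0.8, beta=2.3, eta=1.2, c=1.261)
ratio = rho / math.sqrt(r)
boost = 1.261 * delta / math.sqrt(gamma)  # boost in standard deviations
```

For these parameters the boost $c\delta/\sqrt{\gamma}$ is about 1.57 standard deviations, compared with $c\sqrt{\alpha} \approx 1.13$ without the refinement, which illustrates why the variant tolerates a smaller $c$.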

3.2 Finding hidden dense graphs in G(n,p) 

Define the random graph model $G(n, p, k, q)$ for $0 < p < q \le 1$. Given a set of $n$ vertices, randomly
choose a subset $K$ of $k$ vertices. For every pair of vertices $(u, v)$, the edge between them exists with
probability $p$ if at least one of the two vertices is in $V \setminus K$, and with probability $q$ if they are both
in $K$. The model discussed in the previous sections is equivalent to $G(n, \frac12, c\sqrt{n}, 1)$.
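A direct sampler for this model takes only a few lines. The sketch below is an illustration, not part of the paper; the function name `planted_dense_graph` and the adjacency-set representation are assumptions made for the example.

```python
import random


def planted_dense_graph(n, p, k, q, rng):
    """Sample from G(n, p, k, q); returns (list of neighbor sets, dense set K)."""
    dense = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            # Pairs inside K get probability q, every other pair probability p.
            prob = q if (u in dense and v in dense) else p
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj, dense
```

Calling it with `p=0.5` and `q=1.0` yields a sample from the hidden clique model, since `rng.random()` is always strictly less than 1.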

Next, we define a generalization of the algorithm from the previous section. This algorithm has
the same three phases as before. In the first phase, the definition of $V_i$ is different. $V_i$ is defined as
the set of vertices with at least $p|S_i| + \beta\sqrt{p(1-p)|S_i|}$ neighbors in $S_i$. Namely,

$$V_i = \left\{v \in V(G_i) \setminus S_i : d_{S_i}(v) \ge p|S_i| + \beta\sqrt{p(1-p)|S_i|}\right\}.$$

Define $\rho' = (1-\alpha)\bar\Phi\Big(\beta - c\sqrt{\alpha}\,\frac{q-p}{\sqrt{p(1-p)}}\Big)$. In the second phase, after $t$ iterations, define $K'$ to be the
set of $\rho'^t k$ largest degree vertices in $G_t$, and let $\tilde K$ contain all the vertices in $G_t$ that have at least
$\frac12(p+q)|K'|$ neighbors in $K'$. In the third phase, let $K'$ (redefined for this phase) be the set of vertices containing $\tilde K$ and all the
vertices in $G$ that have at least $\frac12(p+q)|\tilde K|$ neighbors in $\tilde K$. Let $K^*$ be the set of all vertices in $G$
that have at least $\frac12(p+q)k$ neighbors in $K'$. The algorithm returns $K^*$ as the candidate for the
dense graph.

Theorem 3.4. If $c \ge c_0\frac{\sqrt{p(1-p)}}{q-p}$ then there exist $0 < \alpha < 1$ and $\beta > 0$ for which, given a graph
$G \in G(n, p, c\sqrt{n}, q)$, the above algorithm finds the hidden dense graph whp($e^{-\Theta(n^{\varepsilon_0})}$) for $\varepsilon_0 = \varepsilon_0(c)$.

To prove Thm. 3.4, as in the hidden clique case, we first prove the correctness of each of the
phases of the algorithm, and then bound the failure probability. To prove the correctness of the
first phase, we prove Lemmas B.1 and B.2, which are analogous to Lemmas 2.4 and 2.6. To prove
the correctness of the second phase, we prove Lemma B.3 and Cor. B.4, which are analogous to
Lemma 2.7 and Cor. 2.8. To prove the correctness of the third phase we prove Lemma B.5. The
failure probability follows as in Lemma 2.10 by noticing that substituting $c\frac{\sqrt{p(1-p)}}{q-p}$ for $c$ in the
definition of $\rho'$ gives the exact definition of $\rho$.



4 Discussion 



Our results bring up some interesting questions for future research. For example, one of the
advantages of the algorithm presented here is a failure probability that is less than polynomially
small in the size of the input. Experimental results shown in [14] suggest that the failure probability
of the algorithm described there may also be $o(1)$. Whether the analysis can be improved to prove
this rigorously is an interesting open question. One can also ask whether the analysis in [3] can be
improved to show failure probability that is less than polynomially small.

Aside from the most interesting open question of whether there exists an algorithm that finds
hidden cliques for $k = o(\sqrt{n})$, one can ask about ways to find hidden cliques of size $k = c\sqrt{n}$ as $c$
gets smaller. In [3], Alon, Krivelevich and Sudakov give a way to improve the constant for which
their algorithm works, at the expense of increasing the running time. This technique can be used
for any algorithm that finds hidden cliques, so we describe it here. Pick a random vertex $v \in V$,
and run the algorithm only on the subgraph containing $v$ and its neighborhood. If $v$ is a clique vertex,
then the parameters of the algorithm have improved, since instead of having a graph with $n$ vertices
and a hidden clique of size $c\sqrt{n}$ we now have a graph with $\frac{n}{2}$ vertices and a hidden clique of size
$c\sqrt{n}$. The expected number of trials we need to do until we pick a clique vertex is $O(\sqrt{n})$. This
means that if we have an algorithm that finds a hidden clique of size $c\sqrt{n}$, where $c \ge c_0$, we can
also find a hidden clique for $c \ge \frac{c_0}{\sqrt{2}}$, while increasing the running time by a factor of $\sqrt{n}$. If we
wish to improve the constant even further, we can pick $r$ random vertices and run the algorithm
on the subgraph containing them and their common neighborhood. This gives an algorithm that
works for constants smaller by up to a factor of $2^{r/2}$ than the original constant, at the expense of
increasing the running time of the algorithm by a factor of $n^{r/2}$.
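The arithmetic behind these factors can be spelled out as follows. This is a heuristic sketch that ignores lower-order terms; $n'$ and $c'$ are names introduced here for the size of the restricted graph and the effective constant, and the $r$ picked vertices are assumed to all lie in the clique.

```latex
\[
  n' \approx \frac{n}{2^{r}}, \qquad
  k = c\sqrt{n} = c\,2^{r/2}\sqrt{n'} = c'\sqrt{n'}
  \quad\text{with}\quad c' = 2^{r/2}c ,
\]
\[
  c' \ge c_0 \iff c \ge \frac{c_0}{2^{r/2}}, \qquad
  \mathbb{E}[\text{number of trials}] \approx \left(\frac{n}{k}\right)^{r}
  = \left(\frac{\sqrt{n}}{c}\right)^{r} = O\!\left(n^{r/2}\right).
\]
```

For $r = 1$ this recovers the constant $c_0/\sqrt{2}$ and the $\sqrt{n}$ slowdown stated above.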

We have described a sequence of algorithms whose running times increase by factors of $\sqrt{n}$. It
is not known whether the constant can be decreased if we can only increase the running time by a
factor smaller than $\sqrt{n}$.

Question 1. Given an algorithm that runs in time $O(n^2)$ and finds hidden cliques of size $c\sqrt{n}$ for
any $c \ge c_0$, is there an algorithm that runs in time $O(n^{2+\varepsilon})$, where $\varepsilon < \frac12$, and finds hidden cliques
of size $c\sqrt{n}$ where $c < c_0$? How small can $c$ be as a function of $\varepsilon$?






References 

[1] M. Abramowitz and I. A. Stegun. Handbook of Mathematical Functions with Formulas, Graphs, 
and Mathematical Tables. Dover Publications, New York, 1964. 

[2] N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. Testing k-wise and 
almost k-wise independence. In STOC, pages 496-505, 2007. 

[3] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. 
Random Structures and Algorithms, 13:457-466, 1998. 

[4] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. Proof verification and hardness 
of approximation problems. In FOCS, pages 14-23, 1992. 

[5] S. Arora and S. Safra. Probabilistic checking of proofs: A new characterization of NP. J. ACM,
45(1):70-122, 1998.

[6] A. Ben-Dor, R. Shamir, and Z. Yakhini. Clustering gene expression patterns. Journal of
Computational Biology, 6(3/4):281-297, 1999.

[7] E. Ben Sasson, Y. Bilu, and D. Gutfreund. Finding a randomly planted assignment in a 
random 3CNF, 2002. manuscript. 

[8] A. C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates.
Transactions of the American Mathematical Society, 49(1):122-136, 1941.

[9] R. Durrett. Probability: Theory and Examples. Cambridge University Press, fourth edition, 
2010. 

[10] C. G. Esseen. On the Liapunoff limit of error in the theory of probability. Arkiv for matematik,
astronomi och fysik, A28:1-19, 1942.

[11] U. Feige, S. Goldwasser, L. Lovász, S. Safra, and M. Szegedy. Approximating clique is almost
NP-complete (preliminary version). In FOCS, pages 2-12, 1991.

[12] U. Feige and R. Krauthgamer. Finding and certifying a large hidden clique in a semirandom
graph. Random Struct. Algorithms, 16(2):195-208, 2000.

[13] U. Feige and R. Krauthgamer. The probable value of the Lovász-Schrijver relaxations for
maximum independent set. SIAM J. Comput., 32(2):345-370, 2003.

[14] U. Feige and D. Ron. Finding hidden cliques in linear time. In AOFA, 2010. 

[15] A. M. Frieze and R. Kannan. A new approach to the planted clique problem. In FSTTCS, 
pages 187-198, 2008. 

[16] G. Grimmett and C. McDiarmid. On colouring random graphs. Math. Proc. Cam. Phil. Soc, 
77:313-324, 1975. 

[17] E. Hazan and R. Krauthgamer. How hard is it to approximate the best Nash equilibrium? In
SODA, pages 720-727, 2009.






[18] M. Jerrum. Large cliques elude the Metropolis process. Random Structures and Algorithms,
3:347-359, 1992.

[19] A. Juels and M. Peinado. Hiding cliques for cryptographic security. Des. Codes Cryptography,
20(3):269-280, 2000.

[20] R. M. Karp. Reducibility among combinatorial problems. In R. E. Miller and J. W. Thatcher, 
editors, Complexity of computer computations, pages 85-103. Plenum Press, New York, 1972. 

[21] M. Krivelevich and D. Vilenchik. Solving random satisfiable 3CNF formulas in expected 
polynomial time. In SODA, pages 454-463, 2006. 

[22] L. Kučera. A generalized encryption scheme based on random graphs. In Gunther Schmidt 
and Rudolf Berghammer, editors, WG, volume 570 of Lecture Notes in Computer Science, 
pages 180-186. Springer, 1991. 

[23] L. Kučera. Expected complexity of graph partitioning problems. Discrete Applied Math., 
57:193-212, 1995. 

[24] C. McDiarmid. On the method of bounded differences. In Surveys in combinatorics, pages 
148-188. Cambridge University Press, 1989. 

[25] F. McSherry. Spectral partitioning of random graphs. In FOCS, pages 529-537, 2001. 

A Concentration inequalities 

Throughout the paper, we use the central limit theorem for binomial random variables, and its rate 
of convergence, which was discovered independently by Berry in 1941 [8] and by Esseen in 1942 [10]. 
For details, see, for example, [9, Sec. 3.4.4]. 

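Theorem A.1 (Berry, Esseen). Let $B(n,p)$ be a binomial random variable with parameters $n,p$, and let $\Phi$ denote the standard normal distribution function. Then for every $x \in \mathbb{R}$ 

$$\left| P\left(B(n,p) \le pn + x\sqrt{p(1-p)n}\right) - \Phi(x) \right| \le \frac{C\,(p^2 + (1-p)^2)}{\sqrt{p(1-p)n}},$$ 

where $C > 0$ is an absolute constant. 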

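Corollary A.2. Let $B(n,p)$ be a binomial random variable, and let $\bar\Phi(a)$ denote the probability that a standard normal random variable is greater than $a$. For any $a \in \mathbb{R}$, the probability that $B(n,p)$ is greater than $pn + a\sqrt{p(1-p)n}$ satisfies 

$$\left| P\left(B(n,p) > pn + a\sqrt{p(1-p)n}\right) - \bar\Phi(a) \right| \le \frac{c}{\sqrt{n}}$$ 

for some constant $c > 0$ that depends only on $p$. □ 

Theorem A.3 (Hoeffding's Inequality). Let $S = X_1 + \cdots + X_n$ where the $X_i$'s are independent Bernoulli random variables. Then for every $t > 0$ 

$$P(|S - ES| > t) \le 2e^{-2t^2/n}.$$ 

Corollary A.4. Let $A, B$ be two disjoint sets of vertices in $G \in G(n,p)$ with $|A| = n_1$ and $|B| = n_2$ such that $n_1 \le O(n_2)$. Given $a \in \mathbb{R}$, define the random variable 

$$X = \left| \{ v \in A : d_B(v) > pn_2 + a\sqrt{p(1-p)n_2} \} \right|.$$ 

Then for every $c' > 0$ and $0 < \varepsilon < \frac{1}{2}$ it holds that 

$$P\left(|X - \bar\Phi(a)n_1| > c' n_1^{1-\varepsilon}\right) \le e^{-c'^2 n_1^{1-2\varepsilon}/2}.$$ 

Proof. From Cor. A.2 we know that $|\bar\Phi(a)n_1 - EX| \le c\,n_1/\sqrt{n_2}$ for some constant $c > 0$. Therefore, by Thm. A.3, for any constant $c' > 0$, 

$$P\left(|X - \bar\Phi(a)n_1| > c' n_1^{1-\varepsilon}\right) \le P\left(|X - EX| > c' n_1^{1-\varepsilon} - c\,n_1/\sqrt{n_2}\right) \le e^{-c'^2 n_1^{1-2\varepsilon}/2},$$ 

where the last inequality holds because $c\,n_1/\sqrt{n_2} \le O(\sqrt{n_1}) = o(n_1^{1-\varepsilon})$. □ 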
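
As a numerical sanity check of the two tools above, the following minimal Python sketch compares exact binomial tails with the Gaussian tail and with the Hoeffding bound. The function names are illustrative and not part of the paper, and the exact computation is only meant for moderate $n$ (a few hundred), where the floating-point arithmetic stays in range.

```python
import math


def binom_pmf(n, p, j):
    """P(B(n,p) = j), computed directly; intended for moderate n only."""
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)


def upper_tail(n, p, a):
    """Exact P(B(n,p) > pn + a*sqrt(p(1-p)n))."""
    threshold = p * n + a * math.sqrt(p * (1 - p) * n)
    return sum(binom_pmf(n, p, j) for j in range(n + 1) if j > threshold)


def gaussian_upper_tail(a):
    """P(Z > a) for a standard normal random variable Z."""
    return 0.5 * math.erfc(a / math.sqrt(2))


def two_sided_tail(n, p, t):
    """Exact P(|B(n,p) - pn| > t)."""
    return sum(binom_pmf(n, p, j) for j in range(n + 1) if abs(j - p * n) > t)


def hoeffding_bound(n, t):
    """Right-hand side of Hoeffding's inequality for n Bernoulli summands."""
    return 2 * math.exp(-2 * t * t / n)


if __name__ == "__main__":
    n, p = 400, 0.5
    # Gap between the exact tail and the Gaussian tail (order 1/sqrt(n)).
    print(abs(upper_tail(n, p, 1.0) - gaussian_upper_tail(1.0)))
    # Exact two-sided tail next to its Hoeffding upper bound.
    print(two_sided_tail(n, p, 30), hoeffding_bound(n, 30))
```

The first printed value illustrates the $O(1/\sqrt{n})$ approximation error used in Cor. A.4, and the second line illustrates that the Hoeffding bound dominates the exact two-sided tail.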

B The G(n,p,k,q) case 

Lemma B.1 (analogous to Lemma 2.4). For every $i \ge 0$, the graph $G_i$ defined in the $i$'th iteration 
of the algorithm is a copy of $G(n_i, p, k_i, q)$. 

Proof. The proof is identical to the proof of Lemma 2.4. □ 

Lemma B.2 (analogous to Lemma 2.6). For every $0 < \varepsilon_1, \varepsilon_2 < \frac{1}{2}$, the set $V$ satisfies $\big| |V| - \tau n \big| \le O(n^{1-\varepsilon_1})$ and $\big| |V \cap K| - \rho' k \big| \le O(k^{1-\varepsilon_2})$ whp($e^{-\Theta(n^{1-2\varepsilon_1})} + e^{-\Theta(k^{1-2\varepsilon_2})}$). 

Proof. Follows from Cor. A.4 the same way as in the proof of Lemma 2.6. □ 



Lemma B.3 (analogous to Lemma 2.7). Let $G \in G(n,p,k,q)$ where $k \ge c\sqrt{n \log n}$. Denote the 
hidden dense graph by $K$ and the set of $k$ largest degree vertices by $M$. Then 

$$P(|K \setminus M| > 0) \le e^{-\left(\frac{(q-p)^2 k^2}{2n} - \log n - O(1)\right)}.$$ 

Proof. Define $x = \frac{1}{2}(q-p)k$. Then by Thm. A.3, 

$$P(\exists v \notin K : d(v) \ge pn + x) \le n P(B(n,p) \ge pn + x) \le n P(|B(n,p) - pn| \ge x) \le 2n e^{-(q-p)^2 k^2/2n}.$$ 

On the other hand, 

$$P(\exists v \in K : d(v) < pn + x) \le k P\big(B(n-k,p) + B(k,q) - p(n-k) - qk < x - (q-p)k\big)$$ 
$$\le k P\big(|B(n-k,p) + B(k,q) - p(n-k) - qk| \ge x\big)$$ 
$$\le 2k e^{-(q-p)^2 k^2/2n}.$$ 

Therefore, the probability that there exist a vertex $v \notin K$ and a vertex $u \in K$ such that $d(u) < d(v)$ 
is bounded by $2(n+k)e^{-(q-p)^2 k^2/2n}$. □ 
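
The degree-separation argument of Lemma B.3 can be illustrated with a small simulation. The sketch below is not taken from the paper: the sampler, the function names and the parameter values are assumptions chosen for illustration. It samples $G(n,p,k,q)$, takes the $k$ largest-degree vertices as $M$, and evaluates the bound $2(n+k)e^{-(q-p)^2k^2/2n}$ from the proof.

```python
import math
import random


def sample_hidden_dense_graph(n, p, k, q, rng):
    """Sample G(n,p,k,q): pairs inside the planted set K are edges with
    probability q, all other pairs with probability p."""
    planted = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            prob = q if (u in planted and v in planted) else p
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj, planted


def top_degree_set(adj, k):
    """The set M of the k largest-degree vertices."""
    order = sorted(range(len(adj)), key=lambda v: len(adj[v]), reverse=True)
    return set(order[:k])


def lemma_b3_bound(n, p, k, q):
    """The failure bound 2(n+k) * exp(-(q-p)^2 k^2 / (2n)) from the proof."""
    return 2 * (n + k) * math.exp(-((q - p) ** 2) * k * k / (2 * n))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    n, p, k, q = 200, 0.2, 100, 0.9
    adj, planted = sample_hidden_dense_graph(n, p, k, q, rng)
    print(top_degree_set(adj, k) == planted)
    print(lemma_b3_bound(n, p, k, q))
```

With these illustrative parameters the expected degrees inside and outside $K$ are far apart relative to their standard deviations, which is the regime the lemma addresses.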

Corollary B.4 (analogous to Cor. 2.8). If the algorithm does $t$ iterations before finding $\tilde K$ and 
succeeds in every iteration, then whp($e^{-\Theta(\cdots)}$) $\tilde K$ is a subset of the original hidden dense graph. 

Proof. The proof is analogous to the proof of Cor. 2.8, by noticing that whp($e^{-\Theta(\cdots)}$) every 
hidden dense graph vertex in $G_t$ has at least $(q - \cdots)k_t - o(k_t)$ neighbors in $K'$ and every non- 
hidden dense graph vertex in $G_t$ has at most $(p + \cdots)k_t + o(k_t)$ neighbors in $K'$. □ 
Lemma B.5. We are given a random graph $G \in G(n,p,k,q)$, and also a subset $\tilde K$ of the hidden dense 
graph of size $s$. Denote the hidden dense graph in $G$ by $K$. Suppose that either 

(a) $k = O(\log n \log\log n)$ and $s \ge \left(\frac{2}{(q-p)^2} + \varepsilon\right)\ln n$ for some $\varepsilon > 0$, or 

(b) $k \ge \omega(\log n \log\log n)$ and $s \ge \frac{2}{(q-p)^2}\ln n + 1$. 

Let $K'$ denote the set of vertices containing $\tilde K$ and all the vertices in $G$ that have at least $\frac{1}{2}(p+q)s$ 
neighbors in $\tilde K$. Define $K^*$ to be the set of vertices of $G$ that have at least $\frac{1}{2}(p+q)k$ neighbors in 
$K'$. Then for every $0 < \varepsilon_3 < \frac{1}{2}$, whp($e^{-\Theta(s\log k)} + e^{-\Theta(k^{1-2\varepsilon_3})}$), $K^* = K$. 

Proof. Consider an arbitrary subset $S$ of $K$ of size $s$. By Thm. A.3, the probability that a specific 
vertex $v \notin K$ has more than $\frac{1}{2}(p+q)s$ neighbors in $S$ is bounded by $e^{-(q-p)^2 s/2}$. The probability that 
a specific vertex $v \in K$ has less than $\frac{1}{2}(p+q)s$ neighbors in $S$ is bounded by the same expression. 
Therefore, the probability of having at least $l_0$ "bad" vertices (where "bad" is defined by either a 
vertex of $K$ that is not in $K'$ or a vertex not in $K$ that is in $K'$) is bounded by $\sum_{l=l_0}^{n} n^l e^{-(q-p)^2 s l/2}$. 
Taking union bound over all subsets of size $s$ of $K$ gives that the probability that there exists a 
subset with at least $l_0$ bad vertices is bounded by 

$$\sum_{l=l_0}^{n} k^s e^{l(\ln n - (q-p)^2 s/2)} \le n\,e^{s\ln k - l_0((q-p)^2 s/2 - \ln n)} = e^{\ln n + s\ln k - l_0((q-p)^2 s/2 - \ln n)}.$$ 

If we take $l_0 = \frac{2(\ln n + s\ln k)}{(q-p)^2 s/2 - \ln n}$, this probability is $e^{-\ln n - s\ln k}$. Therefore, whp($e^{-\ln n - s\ln k}$) there are 
at most $l_0$ bad vertices in $K'$. Specifically, this implies that $K'$ contains at least $k - l_0$ vertices 
from $K$ and at most $l_0$ vertices not from $K$, and that $|K'| \le k + l_0$. By Thm. A.3 and the union 
bound, the probability that there exists a vertex $v \in K$ with less than $qk - k^{1-\varepsilon_3}$ neighbors in $K$ 
is bounded by $e^{-\Theta(k^{1-2\varepsilon_3})}$, and so is the probability that there exists a vertex $v \notin K$ with more 
than $pk + k^{1-\varepsilon_3}$ neighbors in $K$. Therefore, whp($e^{-\Theta(k^{1-2\varepsilon_3})}$) the number of neighbors every $v \in K$ 
has in $K'$ is at least $qk - k^{1-\varepsilon_3} - l_0$, and the number of neighbors every $v \notin K$ has in $K'$ is at 
most $pk + k^{1-\varepsilon_3} + l_0$. Thus, if $s$ and $k$ are such that $l_0 = o(k)$ then whp($e^{-\ln n - s\ln k} + e^{-\Theta(k^{1-2\varepsilon_3})}$) 
$K^* = K$. □ 
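
The two thresholding steps in Lemma B.5 can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' implementation: the sampler, the function names and the parameter values are hypothetical, and the seed set is simply drawn at random from the planted set to stand in for the given subset of size $s$.

```python
import random


def sample_hidden_dense_graph(n, p, k, q, rng):
    """Sample G(n,p,k,q): pairs inside the planted set K are edges with
    probability q, all other pairs with probability p."""
    planted = set(rng.sample(range(n), k))
    adj = [set() for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            prob = q if (u in planted and v in planted) else p
            if rng.random() < prob:
                adj[u].add(v)
                adj[v].add(u)
    return adj, planted


def expand_seed(adj, seed, p, q, k):
    """Return (K', K*) as defined in Lemma B.5 for a seed set of size s."""
    s = len(seed)
    vertices = range(len(adj))
    # K': the seed plus every vertex with at least (p+q)s/2 neighbors in the seed.
    k_prime = set(seed) | {
        v for v in vertices if len(adj[v] & seed) >= 0.5 * (p + q) * s
    }
    # K*: every vertex with at least (p+q)k/2 neighbors in K'.
    k_star = {v for v in vertices if len(adj[v] & k_prime) >= 0.5 * (p + q) * k}
    return k_prime, k_star


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is reproducible
    n, p, k, q, s = 200, 0.2, 100, 0.9, 40
    adj, planted = sample_hidden_dense_graph(n, p, k, q, rng)
    seed = set(rng.sample(sorted(planted), s))
    k_prime, k_star = expand_seed(adj, seed, p, q, k)
    print(len(k_prime), k_star == planted)
```

The illustrative value $s = 40$ exceeds $\frac{2}{(q-p)^2}\ln n \approx 21.6$ for these parameters, in line with the lower bound on $s$ that both cases of the lemma impose.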